Exploration of Uncharted Regions of the Protein Universe

Authors

  • Lukasz Jaroszewski
  • Zhanwen Li
  • S. Sri Krishna
  • Constantina Bakolitsa
  • John Wooley
  • Ashley M. Deacon
  • Ian A. Wilson
  • Adam Godzik
Abstract

The genome projects have unearthed an enormous diversity of genes of unknown function that are still awaiting biological and biochemical characterization. These genes, like most others, can be grouped into families based on sequence similarity. The PFAM database currently contains over 2,200 such families, referred to as domains of unknown function (DUF). In a coordinated effort, the four large-scale centers of the NIH Protein Structure Initiative have determined the first three-dimensional structures for more than 250 of these DUF families. Analysis of the first 248 reveals that about two-thirds of the DUF families likely represent very divergent branches of already known and well-characterized families, which allows hypotheses to be formulated about their biological function. The remainder can be formally categorized as new folds, although about one-third of these show significant substructure similarity to previously characterized folds. These results suggest that, despite the enormous increase in the number and diversity of new genes being uncovered, the fold space of the proteins they encode is gradually becoming saturated. The previously unexplored sectors of the protein universe appear to be shaped primarily by extreme diversification of known protein families, which in turn enables organisms to evolve new functions and adapt to particular niches and habitats. Nevertheless, these DUF families still constitute the richest source for discovery of the remaining protein folds and topologies.

Similar resources

Stochastic voyages into uncharted chemical space produce a representative library of all possible drug-like compounds.

The "small molecule universe" (SMU), the set of all synthetically feasible organic molecules of 500 Da molecular weight or less, is estimated to contain over 10^60 structures, making exhaustive searches for structures of interest impractical. Here, we describe the construction of a "representative universal library" spanning the SMU that samples the full extent of feasible small molecule chemi...

Intrinsic map dynamics exploration for uncharted effective free-energy landscapes.

We describe and implement a computer-assisted approach for accelerating the exploration of uncharted effective free-energy surfaces (FESs). More generally, the aim is the extraction of coarse-grained, macroscopic information from stochastic or atomistic simulations, such as molecular dynamics (MD). The approach functionally links the MD simulator with nonlinear manifold learning techniques. The...

Charting an Unknown Protein Universe

On the surface, the protein universe seems dauntingly vast. Driven by the increasingly rapid accumulation of genomic sequences, the past few decades have yielded sequence data for several million gene products, leaving researchers struggling to keep up. Of the 10,000 protein families listed in the latest release from PFAM (http://pfam.sanger.ac.uk/), an online database that groups proteins base...

Virtual Exploration of the Chemical Universe up to 11 Atoms of C, N, O, F: Assembly of 26.4 Million Structures (110.9 Million Stereoisomers) and Analysis for New Ring Systems, Stereochemistry, Physicochemical Properties, Compound Classes, and Drug Discovery

All molecules of up to 11 atoms of C, N, O, and F possible under consideration of simple valency, chemical stability, and synthetic feasibility rules were generated and collected in a database (GDB). GDB contains 26.4 million molecules (110.9 million stereoisomers), including three- and four-membered rings and triple bonds. By comparison, only 63 857 compounds of up to 11 atoms were found in pu...

Expression, purification, and immunization of a chimeric protein containing immunogenic regions of flagellin and intimin proteins against E. coli O157: H7

Introduction: Enterohemorrhagic Escherichia coli (EHEC) serotype O157:H7 is one of the most important pathogens causing diarrhea. Shiga-like toxin secreted by the bacteria destroys epithelial cells and, in acute cases, causes hemolytic uremic syndrome (HUS). Antibiotic therapy is not effective against this pathogen, because it increases the production of Shiga toxin. Designing chimeric immu...

Journal title:

Volume 7, Issue

Pages -

Publication date: 2009